Nonlinear Dimensionality Reduction by Maximum Variance Unfolding
Authors
Abstract
Many problems in AI are simplified by clever representations of sensory or symbolic input. How to discover such representations automatically, from large amounts of unlabeled data, remains a fundamental challenge. The goal of statistical methods for dimensionality reduction is to detect and discover low dimensional structure in high dimensional data. In this paper, we review a recently proposed algorithm—maximum variance unfolding—for learning faithful low dimensional representations of high dimensional data. The algorithm relies on modern tools in convex optimization that are proving increasingly useful in many areas of machine learning.

Introduction

A fundamental challenge of AI is to develop useful internal representations of the external world. The human brain excels at extracting small numbers of relevant features from large amounts of sensory data. Consider, for example, how we perceive a familiar face. A friendly smile or a menacing glare can be discerned in an instant and described by a few well chosen words. On the other hand, the digital representations of these images may consist of hundreds or thousands of pixels. Clearly, there are much more compact representations of images, sounds, and text than their native digital formats. With such representations in mind, we have spent the last few years studying the problem of dimensionality reduction—how to detect and discover low dimensional structure in high dimensional data.

For higher-level decision-making in AI, the right representation makes all the difference. We mean this quite literally, in the sense that proper judgments of similarity and difference depend crucially on our internal representations of the external world. Consider, for example, the images of teapots in Fig. 1. Each image shows the same teapot from a different angle. Compared on a pixel-by-pixel basis, the query image and image A are the most similar pair of images; that is, their pixel intensities have the smallest mean-squared-difference (a small code sketch of this comparison appears below). The viewing angle in image B, however, is much closer to the viewing angle in the query image—evidence that distances in pixel space do not support crucial judgments of similarity and difference. (Consider the embarrassment when your robotic butler grabs the teapot by its spout rather than its handle, not to mention the liability when it subsequently attempts to refill your guest's cup.) A more useful representation of these images would index them by the teapot's angle of rotation, thus locating image B closer to the query image than image A.

[Figure 1: Images of teapots: pixel distances versus perceptual distances. As measured by the mean-squared-difference of pixel intensities, image A is closer to the query image than image B, despite the fact that the view in image A involves a full 180 degrees of rotation.]

Objects may be similar or different in many ways. In the teapot example of Fig. 1, there is only one degree of freedom: the angle of rotation. More generally, there may be many criteria that are relevant to judgments of similarity and difference, each associated with its own degree of freedom. These degrees of freedom are manifested over time by variabilities in appearance or presentation. The most important modes of variability can often be distilled by automatic procedures that have access to large numbers of observations.
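As a concrete illustration of the pixel-space comparison just described, the following minimal Python sketch ranks candidate images by the mean-squared-difference of their pixel intensities. The arrays below are random stand-ins for the teapot images of Fig. 1 (the shapes and names are hypothetical); with the real images loaded in their place, the smallest value identifies the nearest image in pixel space.

```python
import numpy as np

def pixel_mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean-squared-difference of pixel intensities between two equally sized images."""
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))

# Random stand-ins for the grayscale teapot images of Fig. 1 (hypothetical shapes).
rng = np.random.default_rng(0)
query, image_a, image_b = (rng.random((76, 101)) for _ in range(3))

# Rank the candidates by pixel-space distance to the query; the smallest MSE "wins".
distances = {"A": pixel_mse(query, image_a), "B": pixel_mse(query, image_b)}
print(sorted(distances.items(), key=lambda kv: kv[1]))
```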
Detecting and distilling these modes of variability is, in essence, the goal of statistical methods for dimensionality reduction (Burges 2005; Saul et al. 2006). The observations, initially represented as high dimensional vectors, are mapped into a lower dimensional space. If this mapping is done faithfully, then the axes of the lower dimensional space relate to the data's intrinsic degrees of freedom. The linear method of principal components analysis (PCA) performs this mapping by projecting high dimensional data into low dimensional subspaces. The principal subspaces of PCA have the property that they maximize the variance of the projected data. PCA works well if the most important modes of variability are approximately linear, in which case the high dimensional observations are well approximated by their projections onto a low dimensional subspace.

(a) Subspace methods. Center the data by subtracting out the mean image, then compute the Gram matrix of the centered images. Plot the eigenvalues of the Gram matrix in descending order. How many leading dimensions of the data are needed to account for 95% of the data's total variance? Submit your source code as part of your solutions.

(b) Manifold learning. Use a manifold learning algorithm of your choice to compute a two dimensional embedding of these images. By visualizing your results in the plane, compare the two dimensional embeddings obtained from subspace methods versus manifold learning. (A sketch covering both parts is given below.)
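The following Python sketch illustrates one way to carry out both parts, under stated assumptions: the images are supplied as an n-by-p matrix X with one flattened image per row (random data is used here as a stand-in), part (a) counts the leading eigenvalues of the centered Gram matrix needed to reach 95% of the total variance, and part (b) uses scikit-learn's Isomap as one possible manifold learning algorithm; any other choice would serve equally well.

```python
import numpy as np
from sklearn.manifold import Isomap

# Stand-in data: n images, each flattened to p pixels (replace with the real images).
rng = np.random.default_rng(0)
n, p = 400, 64 * 64
X = rng.random((n, p))

# (a) Subspace methods: center the images and form the Gram matrix of the centered data.
Xc = X - X.mean(axis=0)
G = Xc @ Xc.T                              # n x n Gram matrix of centered images

eigvals = np.linalg.eigvalsh(G)[::-1]      # eigenvalues in descending order
eigvals = np.clip(eigvals, 0.0, None)      # guard against tiny negative round-off
cum_var = np.cumsum(eigvals) / eigvals.sum()
k95 = int(np.searchsorted(cum_var, 0.95) + 1)
print(f"{k95} leading dimensions account for 95% of the total variance")

# (b) Manifold learning: a two dimensional embedding with Isomap (one possible choice).
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(embedding.shape)                     # (n, 2) coordinates, ready to scatter-plot
```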
Similar resources
On a Connection between Maximum Variance Unfolding, Shortest Path Problems and IsoMap
We present an equivalent formulation of the Maximum Variance Unfolding (MVU) approach to nonlinear dimensionality reduction in terms of distance matrices. This yields a novel interpretation of the MVU problem as a regularized version of the shortest path problem on a graph. This interpretation enables us to establish an asymptotic convergence result for the case that the underlying data are dra...
An Introduction to Nonlinear Dimensionality Reduction by Maximum Variance Unfolding
Many problems in AI are simplified by clever representations of sensory or symbolic input. How to discover such representations automatically, from large amounts of unlabeled data, remains a fundamental challenge. The goal of statistical methods for dimensionality reduction is to detect and discover low dimensional structure in high dimensional data. In this paper, we review a recently proposed...
On the convergence of maximum variance unfolding
Maximum Variance Unfolding is one of the main methods for (nonlinear) dimensionality reduction. We study its large sample limit, providing specific rates of convergence under standard assumptions. We find that it is consistent when the underlying submanifold is isometric to a convex subset, and we provide some simple examples where it fails to be consistent.
Spectral Dimensionality Reduction via Maximum Entropy
We introduce a new perspective on spectral dimensionality reduction which views these methods as Gaussian random fields (GRFs). Our unifying perspective is based on the maximum entropy principle which is in turn inspired by maximum variance unfolding. The resulting probabilistic models are based on GRFs. The resulting model is a nonlinear generalization of principal component analysis. We show ...
A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models
We introduce a new perspective on spectral dimensionality reduction which views these methods as Gaussian Markov random fields (GRFs). Our unifying perspective is based on the maximum entropy principle which is in turn inspired by maximum variance unfolding. The resulting model, which we call maximum entropy unfolding (MEU) is a nonlinear generalization of principal component analysis. We relat...
Colored Maximum Variance Unfolding
Maximum variance unfolding (MVU) is an effective heuristic for dimensionality reduction. It produces a low-dimensional representation of the data by maximizing the variance of their embeddings while preserving the local distances of the original data. We show that MVU also optimizes a statistical dependence measure which aims to retain the identity of individual observations under the distancep...
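As the description above suggests, MVU can be posed as a semidefinite program over the Gram matrix of the embedding: maximize the total variance (the trace) subject to centering, positive semidefiniteness, and preservation of distances between neighboring points. The sketch below is a minimal illustration of that formulation using cvxpy and a k-nearest-neighbor graph; it is not a reference implementation, and the toy data, the value of k, and the solver choice are all assumptions made for the example.

```python
import numpy as np
import cvxpy as cp
from sklearn.neighbors import kneighbors_graph

def mvu_embedding(X: np.ndarray, k: int = 4, dim: int = 2) -> np.ndarray:
    """Toy maximum variance unfolding: an SDP over the Gram matrix K of the outputs."""
    n = X.shape[0]
    neighbors = kneighbors_graph(X, n_neighbors=k, mode="connectivity").toarray()

    K = cp.Variable((n, n), PSD=True)      # Gram matrix of the unfolded points
    constraints = [cp.sum(K) == 0]         # center the embedding at the origin
    for i in range(n):
        for j in range(i + 1, n):
            if neighbors[i, j] or neighbors[j, i]:
                d2 = float(np.sum((X[i] - X[j]) ** 2))
                # Preserve local distances: ||y_i - y_j||^2 = ||x_i - x_j||^2.
                constraints.append(K[i, i] + K[j, j] - 2 * K[i, j] == d2)

    # Maximizing trace(K) maximizes the variance of the centered embedding.
    problem = cp.Problem(cp.Maximize(cp.trace(K)), constraints)
    problem.solve(solver=cp.SCS)

    # Read off low dimensional coordinates from the top eigenvectors of K.
    w, V = np.linalg.eigh(K.value)
    order = np.argsort(w)[::-1][:dim]
    return V[:, order] * np.sqrt(np.clip(w[order], 0.0, None))

# Example: unfold a small noisy arc sampled in 3-D.
t = np.linspace(0, np.pi, 30)
arc = np.column_stack([np.cos(t), np.sin(t),
                       0.05 * np.random.default_rng(0).standard_normal(30)])
print(mvu_embedding(arc).shape)            # (30, 2)
```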
Publication date: 2009